59 research outputs found

Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning

Author: Norinder U
Spjuth O
Svensson F
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/10/2021

Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox

UCL Discovery

Using Predicted Bioactivity Profiles to Improve Predictive Modeling

Author: Norinder U
Spjuth O
Svensson F
Publication date: 06/05/2020

Predictive modeling is a cornerstone in early drug development. Using information for multiple domains or across prediction tasks has the potential to improve the performance of predictive modeling. However, aggregating data often leads to incomplete data matrices that might be limiting for modeling. In line with previous studies, we show that by generating predicted bioactivity profiles, and using these as additional features, prediction accuracy of biological endpoints can be improved. Using conformal prediction, a type of confidence predictor, we present a robust framework for the calculation of these profiles and the evaluation of their impact. We report on the outcomes from several approaches to generate the predicted profiles on 16 datasets in cytotoxicity and bioactivity and show that efficiency is improved the most when including the p-values from conformal prediction as bioactivity profiles

UCL Discovery

Progress on an open source computer-assisted structure elucidation suite (SENECA)

Author: C Steinbeck
Christoph Steinbeck
O Spjuth
S Kuhn
Stefan Kuhn
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Computational toxicology using the OpenTox application programming interface and Bioclipse

Author: A Ruttenberg
A Splendiani
B Hardy
Barry Hardy
C Steinbeck
CA Goble
CR Williams-DeVane
DW Huang
E Prud'hommeaux
E Willighagen
Egon L Willighagen
EL Willighagen
European Parliament C
G Patlewicz
H Ogata
J Bhagat
JJ Carroll
L Chepelev
N Jeliazkova
Nina Jeliazkova
O Spjuth
Ola Spjuth
P Rydberg
R Diderichs
Roland C Grafström
T Kelder
T Oinn
TB Knudsen
U Schmidt
W3C OWL Working Group
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. FINDINGS: This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. CONCLUSIONS: A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, that combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers.

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

VTT Research System

Bioclipse-R: integrating management and visualization of life science data with statistical analysis

Author: Alvarsson J.
Berg A.
Carlsson L.
Eklund M.
Georgiev V.
Spjuth O.
Wikberg J. E.
Willighagen E.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

SUMMARY: Bioclipse, a graphical workbench for the life sciences, provides functionality for managing and visualizing life science data. We introduce Bioclipse-R, which integrates Bioclipse and the statistical programming language R. The synergy between Bioclipse and R is demonstrated by the construction of a decision support system for anticancer drug screening and mutagenicity prediction, which shows how Bioclipse-R can be used to perform complex tasks from within a single software system. AVAILABILITY AND IMPLEMENTATION: Bioclipse-R is implemented as a set of Java plug-ins for Bioclipse based on the R-package rj. Source code and binary packages are available from https://github.com/bioclipse and http://www.bioclipse.net/bioclipse-r, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Maastricht University Research Portal

The C1C2: A framework for simultaneous model selection and assessment

Author: A Golbraikh
A Kontijevskis
AE Hoerl
B Efron
C Hansch
D Wolpert
DE Goldberg
DL Selwood
E Amaldi
E Freyhult
EE Ntzani
G Schwarz
H Akaike
H Kubinyi
H Shimodaira
J Cartmell
J Kuha
J Shao
Jarl ES Wikberg
JES Wikberg
L Wasserman
LJ van't Veer
M Skurichina
M Stone
Martin Eklund
O Nicolotti
O Obrezanova
O Spjuth
Ola Spjuth
P Burman
R Tibshirani
R Todeschini
RE Kass
S Michiels
SJ Cho
SR Johnson
T Hastie
TR Hvidsten
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Ontology of core data mining entities

Author: A Bernstein
A Golbraikh
A Karalic
B Smith
C Silla
C Vens
D Demšar
D Kocev
D Qi
D Young
DJ Hand
F Serban
G Madjarov
G Tsoumakas
GH Bakir
H Mannila
HP Kriegel
I Slavkov
J Vanschoren
K Button
Larisa Soldatova
LN Soldatova
M Courtot
M Ford
M Žáková
MA Avery
MF López
O Spjuth
P Robinson
Panče Panov
Q Yang
R Caruana
R Guha
RD King
RR Brinkman
Sašo Džeroski
T Dietterich
V Podpečan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/07/2014

In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.

Crossref

Brunel University Research Archive

XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services

Author: A Labarga
AR Jones
B Wallner
BioMoby Consortium
C Steinbeck
D Smedley
E Jain
E Willighagen
Egon L Willighagen
EW Sayers
GL Holliday
H Stockinger
H Sugawara
Jarl ES Wikberg
Johannes Wagener
L Stein
LM Vaquero
M Hucka
M Lapins
MA Larkin
MD Wilkinson
MWEJ Fiers
N Adams
O Spjuth
Ola Spjuth
P Fisher
P Murray-Rust
PBT Neerincx
R Kottmann
RD Dowell
S Hoon
S Hunter
S Kaarthik
S Kerrien
S Kuhn
S Miyazaki
T Oinn
UniProt Consortium
X Dong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. Results: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

Open Access LMU

An eScience-Bayes strategy for analyzing omics data

Author: A Gelman
A Isaksson
BP Carlin
C Desmedt
C Sotiriou
CF Taylor
CN Chi
CP Robert
D Milburn
D Muthas
D Talavera
EC Butcher
EL Kaplan
H Chuang
H Daumé III
HB Mann
HM Berman
Jarl ES Wikberg
JO Berger
JR Chen
L Ein-Dor
L Xu
LD Miller
M Xiao-Li
MA Stiffler
Martin Eklund
N Sha
O Spjuth
Ola Spjuth
P Murray-Rust
P Prusis
PCG da Costa
R Development Core Team
R Edgar
R Tonikian
RG Smock
RL Ho
S Gianni
S Lockless
S Michiels
SR Eddy
U Wickenberg-Bolin
Y Pawitan
Y Wang
Z Kutalik
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems. Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

OpenChrom: a cross-platform open source software for the mass spectrometric analysis of chromatographic data

Author: A Davies
A Savitzky
BN Colby
C Steinbeck
CGCS Horstmann
FW McLafferty
H Damen
HW Kong
J Taylor
JA Falkner
JE Biller
JM Halket
Juergen Odermatt
M Sturm
O Spjuth
P Hindmarch
PGA Pedrioli
Philip Wenig
RG Dromey
SE Stein
SY Loh
W Windig
WG Pool
Y Cao
ZB Alfassi
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Today, data evaluation has become a bottleneck in chromatographic science. Analytical instruments equipped with automated samplers yield large amounts of measurement data, which needs to be verified and analyzed. Since nearly every GC/MS instrument vendor offers its own data format and software tools, the consequences are problems with data exchange and a lack of comparability between the analytical results. To challenge this situation a number of either commercial or non-profit software applications have been developed. These applications provide functionalities to import and analyze several data formats but have shortcomings in terms of the transparency of the implemented analytical algorithms and/or are restricted to a specific computer platform. Results: This work describes a native approach to handle chromatographic data files. The approach can be extended in its functionality such as facilities to detect baselines, to detect, integrate and identify peaks and to compare mass spectra, as well as the ability to internationalize the application. Additionally, filters can be applied on the chromatographic data to enhance its quality, for example to remove background and noise. Extended operations like do, undo and redo are supported. Conclusions: OpenChrom is a software application to edit and analyze mass spectrometric chromatographic data. It is extensible in many different ways, depending on the demands of the users or the analytical procedures and algorithms. It offers a customizable graphical user interface. The software is independent of the operating system, due to the fact that the Rich Client Platform is written in Java. OpenChrom is released under the Eclipse Public License 1.0 (EPL). There are no license constraints regarding extensions. They can be published using open source as well as proprietary licenses. OpenChrom is available free of charge at http://www.openchrom.net.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central